
    Application of pre-training and fine-tuning AI models to machine translation: a case study of multilingual text classification in Baidu

    With the development of international information technology, we are producing a huge amount of information all the time. The ability to process information in various languages, rather than the information itself, is becoming the scarcer resource. How to extract the most useful information from such a large and complex body of multilingual text is a major goal of multilingual information processing. Multilingual text classification helps users break the language barrier, accurately locate the information they need, and triage it. At the same time, the rapid development of the Internet has accelerated communication among users of different languages, giving rise to a large number of multilingual texts, such as book and movie reviews, online chats, and product descriptions. These texts contain a great deal of valuable implicit information, and automated tools are urgently needed to categorize and process them. This work describes the Natural Language Processing (NLP) sub-task known as Multilingual Text Classification (MTC) as performed at Baidu, a leading Chinese AI company with a strong Internet base, whose NLP division led the industry in bringing deep learning technology online in Machine Translation (MT) and search. Multilingual text classification is an important module in NLP machine translation and a basic module in NLP tasks. It can be applied to many fields, such as fake review detection, news headline categorization, and the analysis of positive and negative reviews. In the following work, we first define the 'pre-training and fine-tuning' AI model paradigm for deep learning in the Baidu NLP department. We then investigate the application scenarios of multilingual text classification. Most of the text classification systems currently available on the Chinese market, such as Alibaba's text classification system, are designed for a single language. If users need to classify texts of the same category in multiple languages, they must train several single-language text classification systems and then classify the texts one language at a time. However, many international products do not operate in a single language, such as AliExpress's cross-border e-commerce business and Airbnb's B&B business. These businesses need to understand and classify user reviews in various languages in order to carry out in-depth statistical analysis and develop marketing strategies, and multilingual text classification is particularly important in this scenario. Therefore, we focus on interpreting the methodology of the multilingual text classification model for machine translation in the Baidu NLP department. We collect multilingual datasets of reviews, news headlines, and other data, classify and label them manually, use the labeling results to fine-tune the multilingual text classification model, and report quality evaluation data for the Baidu multilingual text classification model after fine-tuning. We discuss whether pre-training and fine-tuning of the large model can substantially improve the quality and performance of multilingual text classification. Finally, based on the machine translation multilingual text classification model, we derive a method for applying the pre-training and fine-tuning paradigm to current cutting-edge deep learning AI models in the NLP system, and verify the generality and cutting-edge nature of this paradigm in the field of deep learning-based intelligent search.
    With the development of international information technology, we are constantly producing an enormous amount of information, and the scarcest resource is no longer information but the ability to process information in each language. Most multilingual information is expressed in the form of text.
How to obtain the most useful information from such a considerable and complex amount of multilingual text is one of the main goals of multilingual information processing. Multilingual text classification helps users break the language barrier, accurately locate the information they need, and classify it. At the same time, the rapid development of the Internet has accelerated communication among users of different languages, giving rise to a large number of multilingual texts, such as book and film reviews, chats, product descriptions, and other kinds of text, which contain a great deal of valuable implicit information and urgently require automated tools to categorize and process them. This work describes the Natural Language Processing (NLP) sub-task known as Multilingual Text Classification (MTC), carried out in the context of Baidu, a leading Chinese AI company whose NLP team led the industry in neural-network-based technology, standing out in Machine Translation (MT) and search. Multilingual text classification is an important module in NLP machine translation and a basic module in NLP tasks. MTC can be applied to many fields, such as multilingual sentiment analysis, news categorization, and spam filtering, among others. In this work, we first define the 'pre-training and fine-tuning' AI model paradigm for deep learning in the Baidu NLP department. We then survey other products on the market with text classification capabilities, namely the text classification offered by Alibaba. This survey shows that most text classification systems currently available on the Chinese market are designed for a single language, such as the Alibaba text classification system.
If users need to classify texts of the same category in several languages, they must apply a separate text classification system for each language and then classify the texts one by one. However, many international products do not operate in a single language, such as AliExpress cross-border e-commerce and the Airbnb B&B business. Industry needs to understand and classify user reviews in various languages; this need has driven in-depth statistical analysis and marketing strategy development, and multilingual text classification is particularly important in this scenario. We therefore focus on interpreting the methodology of the machine translation multilingual text classification model in the Baidu NLP department. To this end, we collected multilingual datasets of comments and reviews, news headlines, and other data for manual classification, used the results of this classification to fine-tune the multilingual text classification model, and produced quality evaluation data for Baidu's multilingual text classification model. We discuss whether pre-training and fine-tuning the model can substantially improve the quality and performance of multilingual text classification. Finally, based on the machine translation multilingual text classification model, we derive a method for applying the pre-training and fine-tuning paradigm to current cutting-edge deep learning AI models in the NLP system, and verify the robustness and positive results of the pre-training and fine-tuning paradigm in the field of deep learning-based search.
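
The abstract reports quality evaluation data for the fine-tuned multilingual classifier but does not say which metrics were used. As a hedged sketch only, the Python below computes per-language accuracy and macro-averaged F1 from (language, gold label, predicted label) triples, a common way to check that one multilingual model performs evenly across languages. The function names and toy records are hypothetical and are not Baidu's evaluation code.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = set(gold) | set(pred)
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def per_language_report(records):
    """records: iterable of (language, gold_label, predicted_label) triples."""
    by_lang = defaultdict(lambda: ([], []))  # language -> (gold list, pred list)
    for lang, gold, pred in records:
        by_lang[lang][0].append(gold)
        by_lang[lang][1].append(pred)
    report = {}
    for lang, (gold, pred) in sorted(by_lang.items()):
        correct = sum(g == p for g, p in zip(gold, pred))
        report[lang] = {
            "n": len(gold),
            "accuracy": correct / len(gold),
            "macro_f1": macro_f1(gold, pred),
        }
    return report


# Hypothetical toy predictions for two languages.
records = [
    ("en", "positive", "positive"),
    ("en", "negative", "positive"),
    ("pt", "positive", "positive"),
    ("pt", "negative", "negative"),
]
print(per_language_report(records))
```

With these toy records the English rows give accuracy 0.5 and macro-F1 of about 0.33, while the Portuguese rows score 1.0 on both; reporting per language rather than pooled keeps a weak language from being hidden by a strong one.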

    Design and evaluation of rhubarb total free anthraquinones oral colon-specific drug delivery granules to improve the purgative effect

    Rhubarb is commonly used as a cathartic in Asian countries. However, because of the unstable purgative effect and potential nephrotoxicity of anthraquinones, researchers have paid extensive attention to the quality control and safety of rhubarb and of traditional Chinese preparations containing rhubarb. In this study, we aimed to prepare rhubarb total free anthraquinones (RTFA) oral colon-specific drug delivery granules (RTFA-OCDD-GN) to deliver anthraquinones to the colon and produce a purgative effect. RTFA-OCDD-GN were prepared with chitosan and Eudragit S100 through a double-layer coating process, and the formulation was optimized. Continuous release studies were performed in simulated gastric fluid (pH 1.2), followed by simulated small-intestinal fluid (pH 6.8) and simulated colonic fluid (pH 7.4, containing rat cecal contents). The purgative effect was tested in rats. The dissolution profile of RTFA-OCDD-GN showed a cumulative RTFA dissolution of about 83.0% in the simulated colonic fluid containing rat cecal contents and only about 9.0% in the simulated gastrointestinal fluids. RTFA-OCDD-GN also produced purgative activity comparable to that of rhubarb, suggesting that the granules can deliver the free anthraquinones (AQs) to the colon. RTFA-OCDD-GN is therefore a useful vehicle for enhancing the purgative activity of orally administered free anthraquinones.
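
Cumulative dissolution percentages such as the 83.0% and 9.0% above are usually computed with a correction for the drug removed in earlier samples: Q_n = 100 × (C_n·V + v·ΣC_i for i < n) / D. The sketch below assumes a fixed medium volume V, a fixed aliquot volume v replaced by fresh medium after each sample, and a known dose D. All numbers are hypothetical, and the sketch ignores the medium changes between the simulated gastric, intestinal and colonic fluids used in the study.

```python
def cumulative_release_percent(conc_mg_per_ml, medium_ml, aliquot_ml, dose_mg):
    """Cumulative % of dose released at each sampling time.

    Amount released = amount currently in the vessel (C_n * V) plus the
    amount already withdrawn in earlier aliquots (v * sum of earlier C_i).
    Each aliquot is assumed to be replaced with fresh, drug-free medium.
    """
    percents = []
    withdrawn_mg = 0.0  # drug removed in previous aliquots
    for c in conc_mg_per_ml:
        released_mg = c * medium_ml + withdrawn_mg
        percents.append(100.0 * released_mg / dose_mg)
        withdrawn_mg += c * aliquot_ml
    return percents


# Hypothetical assay: 900 mL medium, 5 mL aliquots, 100 mg dose.
profile = cumulative_release_percent([0.01, 0.02, 0.05], 900.0, 5.0, 100.0)
print([round(q, 2) for q in profile])
```

With these inputs the profile is about 9.0%, 18.05% and 45.15%; without the withdrawal correction the later values would be slightly underestimated.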

    Computer-Aided Drug Design of Capuramycin Analogues as Anti-Tuberculosis Antibiotics by 3D-QSAR and Molecular Docking

    Capuramycin and a few semisynthetic derivatives have shown potential as anti-tuberculosis antibiotics. To understand their mechanism of action and structure-activity relationships, 3D-QSAR and molecular docking studies were performed. A set of 52 capuramycin derivatives was used as the training set and 13 as the validation set. A highly predictive molecular field analysis (MFA) model was obtained with a cross-validated q² of 0.398, and non-cross-validated partial least-squares (PLS) analysis gave a conventional r² of 0.976 and an r²pred of 0.839. The model has excellent predictive ability. By combining the 3D-QSAR and molecular docking studies, a number of new capuramycin analogues with predicted improved activity were designed. Biological testing of one analogue showed useful antibiotic activity against Mycobacterium smegmatis mc²155 and Mycobacterium tuberculosis H37Rv. Computer-aided molecular docking and 3D-QSAR can improve the design of new capuramycin antimycobacterial antibiotics.
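
The statistics quoted above follow the usual QSAR definitions: q² = 1 − PRESS/SS, computed from leave-one-out predictions on the training set, and r²pred = 1 − PRESS_test/SD, where SD sums the squared deviations of the test-set activities from the training-set mean. The Python sketch below implements these definitions. As an assumption for illustration, a one-descriptor least-squares line stands in for the PLS model on MFA descriptors, and the data are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (stand-in for a PLS model)."""
    xm, ym = mean(xs), mean(ys)
    sxx = sum((x - xm) ** 2 for x in xs)
    slope = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sxx
    return slope, ym - slope * xm


def loo_predictions(xs, ys):
    """Predict each compound from a model refitted without it (leave-one-out)."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return preds


def q2(y_obs, y_loo):
    """Cross-validated q^2 = 1 - PRESS / SS_total."""
    y_bar = mean(y_obs)
    press = sum((y - p) ** 2 for y, p in zip(y_obs, y_loo))
    ss_total = sum((y - y_bar) ** 2 for y in y_obs)
    return 1.0 - press / ss_total


def r2_pred(y_test, y_test_pred, y_train_mean):
    """External r^2_pred = 1 - PRESS_test / SD, with SD taken about the training mean."""
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_test_pred))
    sd = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - press / sd


# Hypothetical descriptor values and activities for a toy training set.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1]
print(q2(y_train, loo_predictions(x_train, y_train)))
```

Because q² is computed from predictions for compounds the model never saw, it is normally lower than the conventional r² of the fitted model, which is consistent with the gap between the two values reported above.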

    5 T Permanent Magnetic Resonance Imaging Device and Its Application for Mouse Imaging

    By improving the design technology and manufacturing methods of the main magnet, gradient coils, and RF coils, and by inventing a new special alloy for magnetic resonance imaging (MRI), a cost-effective permanent-magnet three-dimensional magnetic resonance imager dedicated to small animals was developed. The main magnetic field strength of this imager, which has independent intellectual property rights, is 1.2–1.5 T. To demonstrate its effectiveness and validate mouse imaging in different orientations, we compared the images obtained with this small-animal permanent-magnet imager with those obtained using a superconducting magnetic resonance imager for clinical diagnosis.
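
For the RF coil design mentioned above, the operating frequency follows from the main field through the proton Larmor relation f = (γ/2π)·B0, with γ/2π ≈ 42.577 MHz/T for ¹H. The sketch below simply applies this relation to the 1.2–1.5 T range reported for the imager; it assumes proton imaging and is not taken from the paper.

```python
# Proton (1H) gyromagnetic ratio divided by 2*pi, in MHz per tesla (approx.).
GAMMA_BAR_1H_MHZ_PER_T = 42.577


def larmor_frequency_mhz(b0_tesla):
    """Proton resonance frequency f = (gamma / 2*pi) * B0, in MHz."""
    return GAMMA_BAR_1H_MHZ_PER_T * b0_tesla


for b0 in (1.2, 1.5):  # field range reported for the imager
    print(f"B0 = {b0} T -> f = {larmor_frequency_mhz(b0):.2f} MHz")
```

The result, roughly 51 to 64 MHz, is the band an RF coil for this magnet would need to be tuned to.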

    Polydatin Prevents Lipopolysaccharide (LPS)-Induced Parkinson's Disease via Regulation of the AKT/GSK3β-Nrf2/NF-κB Signaling Axis

    Parkinson's disease (PD) is a common neurodegenerative disease characterized by selective loss of dopaminergic neurons in the substantia nigra (SN). Neuroinflammation induced by over-activation of microglia leads to the death of dopaminergic neurons in the pathogenesis of PD. Therefore, downregulation of microglial activation may aid in the treatment of PD. Polydatin (PLD) has been reported to pass through the blood-brain barrier and protect against motor degeneration in the SN. However, the molecular mechanisms underlying the effects of PLD in the treatment of PD remain unclear. The present study aimed to determine whether PLD protects against dopaminergic neurodegeneration by inhibiting the activation of microglia in a rat model of lipopolysaccharide (LPS)-induced PD. Our findings indicated that PLD treatment protected dopaminergic neurons and ameliorated motor dysfunction by inhibiting microglial activation and the release of pro-inflammatory mediators. Furthermore, PLD treatment significantly increased levels of p-AKT, p-GSK-3βSer9, and Nrf2, and suppressed the activation of NF-κB in the SN of rats with LPS-induced PD. To further explore the neuroprotective mechanism of PLD, we investigated the effect of PLD on activated microglial BV-2 cells. Our findings indicated that PLD inhibited the production of pro-inflammatory mediators and the activation of NF-κB pathways in LPS-induced BV-2 cells. Moreover, our results indicated that PLD enhanced levels of p-AKT, p-GSK-3βSer9, and Nrf2 in BV-2 cells. After BV-2 cells were pretreated with MK2206 (an inhibitor of AKT), NP-12 (an inhibitor of GSK-3β), or Brusatol (BT; an inhibitor of Nrf2), treatment with PLD suppressed the activation of NF-κB signaling pathways and the release of pro-inflammatory mediators in activated BV-2 cells via activation of the AKT/GSK3β-Nrf2 signaling axis. 
Taken together, our results are the first to demonstrate that PLD prevents dopaminergic neurodegeneration due to microglial activation via regulation of the AKT/GSK3β-Nrf2/NF-κB signaling axis

    Dynamics of ammonia oxidizers and denitrifiers in response to compost addition in black soil, Northeast China

    Organic fertilizer application can affect the microbially mediated nitrogen cycle in arable soils. However, the response of soil ammonia oxidizers and denitrifiers to compost addition is poorly understood. In this study, we examined the effect of four compost application rates (0, 11.25, 22.5 and 45 t/ha) on soil ammonia oxidizers and denitrifiers at the soybean seedling, flowering and mature stages in a field experiment in Northeast China. Quantitative PCR showed that compost addition significantly increased the abundance of ammonia-oxidizing bacteria (AOB) at the seedling stage, whereas the abundance of ammonia-oxidizing archaea was unaffected throughout the growing season. The abundance of denitrification genes (nirS, nirK and nosZ) generally increased with compost rate at the seedling and flowering stages, but not at the mature stage. Non-metric multidimensional scaling (NMDS) analysis revealed that moderate and high levels of compost addition consistently shifted the community composition of AOB and nirS-containing denitrifiers across the growing season. Among AOB lineages, Nitrosospira cluster 3a gradually decreased with increasing compost rate across the growing season, whereas Nitrosomonas showed the opposite trend. Network analysis indicated that the complexity of the AOB and nirS-containing denitrifier networks gradually increased with compost rate. Our findings highlight the positive effect of compost addition on the abundance of ammonia oxidizers and denitrifiers and emphasize that compost addition plays a crucial role in shaping their community composition and co-occurrence networks in the black soil of Northeast China.
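
The abstract does not state which dissimilarity measure underlies the NMDS ordination. Bray–Curtis dissimilarity on taxon abundance profiles is a common choice, so the sketch below assumes it and computes the pairwise matrix that an NMDS routine would take as input. The abundance vectors are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i), in [0, 1]."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def dissimilarity_matrix(profiles):
    """Pairwise Bray-Curtis matrix, a typical input to an NMDS ordination."""
    return [[bray_curtis(p, q) for q in profiles] for p in profiles]


# Hypothetical relative abundances of three AOB lineages in three plots.
plots = [
    [0.6, 0.3, 0.1],  # e.g. no compost
    [0.4, 0.4, 0.2],  # e.g. moderate compost
    [0.2, 0.4, 0.4],  # e.g. high compost
]
for row in dissimilarity_matrix(plots):
    print([round(d, 3) for d in row])
```

NMDS then places the plots in a low-dimensional space whose distance ranks best preserve this matrix, so a "shift in community composition" shows up as treatment groups separating in the ordination.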

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
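
The reported error rate can be turned into an expected number of residual errors in the finished sequence. As a back-of-the-envelope estimate, assuming errors are spread uniformly over the 2.85 billion finished nucleotides:

```latex
N_{\text{err}} \approx 2.85\times10^{9}\ \text{nt}\times\frac{1\ \text{error}}{10^{5}\ \text{nt}} = 2.85\times10^{4}\ \text{errors}
```

That is, on the order of 28,500 residual errors across the whole euchromatic sequence.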

    Barrier Thickness and Hydrostatic Pressure Effects on Hydrogenic Impurity States in Wurtzite GaN/AlxGa1−xN Strained Quantum Dots

    Within the framework of the effective mass approximation, the effects of barrier thickness and hydrostatic pressure on the ground-state binding energy of a hydrogenic impurity in wurtzite (WZ) GaN/AlxGa1−xN strained quantum dots (QDs) are investigated using a variational approach. The hydrostatic pressure dependence of physical parameters such as the electron effective mass, energy band gaps, lattice constants, and dielectric constants is taken into account in the calculations. Numerical results show that the donor binding energy increases with hydrostatic pressure for any impurity position. For an impurity located at the center of the QD, the donor binding energy first increases and then drops rapidly as the QD radius (height) decreases in the presence of strong built-in electric fields. The influence of barrier thickness along the QD growth direction and of Al concentration on the donor binding energy is also investigated. In addition, the impurity position is found to have a strong influence on the donor binding energy.
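
The variational approach named above can be illustrated in its simplest setting: a hydrogenic donor in bulk material with trial function ψ ∝ exp(−r/λ). In effective Rydberg units (energy R*, length a*), the trial energy is E(λ) = 1/λ² − 2/λ, and minimizing over λ gives λ = a* and a binding energy of 1 R*. The sketch below performs that minimization numerically with a golden-section search. It deliberately omits the QD confinement, built-in electric field, strain and pressure-dependent parameters treated in the paper, so it is a toy illustration of the method, not the authors' calculation.

```python
import math


def trial_energy(lam):
    """E(lambda) in units of R* for psi ~ exp(-r/lambda), lambda in units of a*.

    Kinetic term 1/lambda^2 plus Coulomb term -2/lambda (bulk hydrogenic donor).
    """
    return 1.0 / lam**2 - 2.0 / lam


def golden_section_min(f, a, b, tol=1e-10):
    """Minimise a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    return (a + b) / 2.0


lam_opt = golden_section_min(trial_energy, 0.1, 10.0)
binding_energy = -trial_energy(lam_opt)  # in units of R*
print(f"lambda* = {lam_opt:.6f} a*, E_b = {binding_energy:.6f} R*")
```

In the QD problem the same idea applies, but the trial energy also contains the confinement potential and the built-in field, which is what makes the binding energy depend on dot size, barrier thickness and impurity position.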

    A Novel Miniature Culture System to Screen CO2-Sequestering Microalgae

    In this study, a novel 96-well microplate swivel system (M96SS) was built for high-throughput screening of microalgal strains for CO2 fixation. Cell growth, residual nitrate, and pH of Chlorella sp. SJTU-3, Chlorella pyrenoidosa SJTU-2, and Scenedesmus obliquus SJTU-3 under different CO2 supply rates (0.2, 0.4, 0.8, and 1.2 g L−1 d−1) were examined in the M96SS and in traditional flask cultures. The time-course data showed good agreement between the two systems. Two critical problems of miniature culture systems, intra-well mixing and evaporation loss, were mitigated by the sealed vertical mixing of the M96SS. A sample screen of six microalgal species (Chlorella sp. SJTU-3, Chlorella pyrenoidosa SJTU-2, Selenastrum capricornutum, Scenedesmus obliquus SJTU-3, Chlamydomonas sajao, Dunaliella primolecta) was carried out in flasks and in the M96SS. Chlamydomonas sajao appeared to be a robust performer (highest cell density: 1.437 g L−1) in anaerobic pond water with 0.8 and 1.2 g L−1 d−1 CO2. The reliability and efficiency of the M96SS were verified by comparing traditional flask culture, the M96SS, Lukavský’s system, and a microplate shaker.
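
The abstract reports good agreement between M96SS and flask cultures without stating how agreement was quantified. One hedged option, sketched below, is to compare paired time-course measurements with a Pearson correlation and a mean absolute percentage difference. The function names and example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_pct_diff(reference, other):
    """Mean absolute difference of `other` from `reference`, as % of reference."""
    return 100.0 * sum(abs(o - r) / r for r, o in zip(reference, other)) / len(reference)


# Hypothetical cell densities (g/L) on days 0, 2, 4, 6 in the two systems.
flask = [0.05, 0.31, 0.78, 1.10]
m96ss = [0.05, 0.29, 0.81, 1.06]
print(f"r = {pearson_r(flask, m96ss):.3f}")
print(f"mean |diff| = {mean_abs_pct_diff(flask, m96ss):.1f}% of flask values")
```

Correlation alone only shows that the curves move together; the percentage difference checks that the miniature system also reproduces the absolute cell densities.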